Add support for the atom chunk encoding in the upcoming Erlang 28 #32
Open
tomas-abrahamsson wants to merge 5 commits into klajo:main
Erlang 28.0 was released yesterday, so I added another commit on top of this PR to include it in the GitHub workflow.
Update supported versions in README.md
In Erlang 20, the Atom chunk was replaced by the AtU8 chunk to support Unicode atoms.
Use the beam_lib module to process chunks; it takes care of size calculations, padding, and more for us. Since we now use beam_lib to process chunks, drop the documentation of the beam file format, as we no longer use this information. Replace it with a link to a description of the beam file format.
The encoding of atom lengths changes in Erlang 28: lengths of up to 15 fit in one octet, but longer lengths require two octets. Support both the old and the new format.
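The one-or-two-octet length described in this commit message could look roughly like the sketch below. The exact bit layout is an assumption and is not stated in this PR: it follows the compact integer encoding used elsewhere in beam files (value in the upper four bits for lengths below 16; otherwise a flag bit, the three high bits of the length, and a second octet with the low eight bits). The module and function names are hypothetical, and the layout should be checked against the OTP source before relying on it.

```erlang
-module(atu8_len).
-export([encode_len/1, decode_len/1]).

%% ASSUMPTION: the length is a compact-encoded unsigned integer with
%% tag bits 000, as used for operands elsewhere in beam files.

%% Lengths 0..15: a single octet with the length in the upper 4 bits.
encode_len(Len) when Len >= 0, Len < 16 ->
    <<(Len bsl 4)>>;
%% Lengths 16..2047: two octets. The first carries bits 10..8 of the
%% length in its upper 3 bits plus the flag 2#00001000, the second
%% carries the low 8 bits.
encode_len(Len) when Len >= 16, Len < 16#800 ->
    <<(((Len bsr 3) band 2#11100000) bor 2#00001000), (Len band 16#FF)>>.

%% Inverse of encode_len/1: returns {Length, RestOfBinary}.
decode_len(<<Len:4, 0:1, 0:3, Rest/binary>>) ->
    {Len, Rest};
decode_len(<<High:3, 0:1, 1:1, 0:3, Low:8, Rest/binary>>) ->
    {(High bsl 8) bor Low, Rest}.
```

Under this assumed layout, a length of 15 is the last value that fits in one octet, which matches the boundary given in the commit message.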
Update supported versions in README.md
46ab6f2 to b0f38d2
I had forgotten to update the badge of supported versions in README.md in the two commits that bump the OTP versions in the GitHub workflow.
In Erlang 27 and earlier, the atoms in the AtU8 chunk are encoded as one byte of length, followed by the UTF-8 encoding of the atom text.
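As a minimal sketch of the older layout, the decoder below reads a chunk payload in the Erlang 27 and earlier format. It assumes the payload starts with a 32-bit big-endian atom count (a detail of the beam file format, not spelled out in this description); the module name `atu8_old` is hypothetical and this is not the code from the PR.

```erlang
-module(atu8_old).
-export([decode/1]).

%% Decode the payload of an AtU8 chunk in the pre-Erlang-28 format:
%% a 32-bit atom count (assumed), then for each atom one length byte
%% followed by that many bytes of UTF-8 text.
decode(<<Count:32/signed, Rest/binary>>) when Count >= 0 ->
    decode_atoms(Count, Rest, []).

%% Returns the atom texts as UTF-8 binaries, in table order.
decode_atoms(0, _Rest, Acc) ->
    lists:reverse(Acc);
decode_atoms(N, <<Len:8, Text:Len/binary, Rest/binary>>, Acc) ->
    decode_atoms(N - 1, Rest, [Text | Acc]).
```

The length byte counts bytes of UTF-8 text here, so a single byte can describe at most 255 bytes.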
In Erlang 28, this changes: the length is either one or two bytes. The encoding of the length is a bit involved, apparently because it reuses a scheme originally introduced when the JIT was added, but it makes it possible to specify the length of an atom even when it consists of code points that encode to several bytes in UTF-8. In Erlang 28, the id of the chunk is still `AtU8`, but the number of atoms is negative instead of positive.
I made some more changes:
- The GitHub action adds Erlang 27 and drops Erlang 23.
- I dropped the code to handle latin-1 encoded atoms when the chunk id is `Atom`. The `AtU8` chunk for UTF-8 encoded atoms was introduced in Erlang 20. I don't know when Erlang dropped support for the old chunk, but in Erlang 27, support was dropped to read beam files from 24 and older, if I remember right.
- Instead of constructing the chunks and the top-level form with updated chunk lengths and paddings, I changed to use `beam_lib:build_module`, which has been around since Erlang 18. So the code now only needs to concern itself with the atom chunk. (As a side note, I found `beam_lib` functionality only for getting a list of atoms from the atom chunk, `beam_lib:chunks(Beam, [atoms])`, but no function to make a chunk from a list of atoms, so the code still needs to know the encoding of the atom chunk.)
- Since the code now uses `beam_lib` to traverse chunks and put them back together into a module, there is no longer any big reason to document the beam file format. My intention was to replace it with a link to its documentation, but I could not find it anywhere. Eventually, I linked to the Wayback Machine.
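The `beam_lib` round trip described above can be sketched as follows. This is an illustration under assumptions, not the code in this PR: the module name `atom_chunk_rewrite` and the callback `Fun` are hypothetical, and only `beam_lib:all_chunks/1` and `beam_lib:build_module/1` are taken from OTP.

```erlang
-module(atom_chunk_rewrite).
-export([rewrite/2]).

%% Read all chunks of a beam file (file name or binary), apply Fun to
%% the raw payload of the AtU8 chunk only, and let beam_lib rebuild the
%% module, including chunk sizes and padding.
%% Fun is a hypothetical callback: fun((binary()) -> binary()).
rewrite(Beam, Fun) when is_function(Fun, 1) ->
    {ok, _Module, Chunks0} = beam_lib:all_chunks(Beam),
    Chunks = [case Id of
                  "AtU8" -> {Id, Fun(Data)};
                  _ -> Chunk
              end || {Id, Data} = Chunk <- Chunks0],
    %% Returns {ok, NewBeamBinary}.
    beam_lib:build_module(Chunks).
```

With an identity callback such as `fun(Data) -> Data end`, the result should be a module with the same atoms as the input, which makes for a simple sanity check before plugging in a real atom chunk transformation.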